Please read the following material before class starts.
After reading the material, please answer the following questions and submit your answers via Google Classroom.
How many data types and structures are there in the R programming language?
State the types of control flow.
What is an R package?
R is a popular programming language for statistics and data analytics. In this module we will learn the basics of R programming for data analytics.
Like other programming languages, R can perform mathematical operations such as addition, subtraction, multiplication, and so on.
# Addition
5 + 5
# Subtraction
5 - 5
# Multiplication
3 * 5
# Division
5 / 2
# Power
2^5
# Modulo
28 %% 6
R allows us to store values as variables. To do this, we can use the command <- or =
# Example of storing a value in a variable with "<-"
example_variable <- 4
The value stored in a variable is not immediately displayed. We have to use the command print(variable_name) to display it.
# View the value in a variable with print()
print(example_variable)
## [1] 4
The following is an example of storing a value in a variable using =
# Example of storing a value in a variable with "="
example_variable_2 = 2
# Print the variable
print(example_variable_2)
## [1] 2
Both operators work. However, in the R community it is more common to use <-
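One practical difference between the two operators shows up inside function calls. The sketch below is only an illustration: the function square() and its argument value are hypothetical names that do not appear in this module, and the code assumes that no variable called value exists before it runs.

```r
# Hypothetical helper used only for this illustration
square <- function(value) {
  value^2
}

# Inside a call, "=" names an argument; it does not create a variable
result_equals <- square(value = 3)
exists_after_equals <- exists("value", inherits = FALSE)

# Inside a call, "<-" assigns in the calling environment,
# then passes the assigned value as the argument
result_arrow <- square(value <- 3)
exists_after_arrow <- exists("value", inherits = FALSE)

print(exists_after_equals)
print(exists_after_arrow)
```

In a fresh session the first check should report that value does not exist and the second that it does, which is one reason many R users reserve <- for assignment and = for naming arguments.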
R has 5 main data types, namely numeric, integer, character, logical, and complex:
# Example of numeric data type
n <- 1.2
To see the data type we can use the class(variable_name) function
# View the data type of n
class(n)
## [1] "numeric"
To see the value of the data we can use the print(variable_name) function, as we have learned before
# View data values
print(n)
## [1] 1.2
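The same class() check works for every data type, and values can be converted between types with the as.*() family of base R functions. The following is a minimal sketch; the variable names are hypothetical and chosen only for this illustration.

```r
# class() reports the data type of a value
class_numeric <- class(1.2)          # "numeric"
class_integer <- class(2L)           # "integer"
class_character <- class("MBA ITB")  # "character"
class_logical <- class(TRUE)         # "logical"
class_complex <- class(3 + 2i)       # "complex"

# Convert between data types
number_as_integer <- as.integer(1.9)  # drops the decimal part, giving 1
number_as_text <- as.character(1.2)   # gives "1.2"
text_as_number <- as.numeric("3.5")   # gives 3.5

print(class_integer)
print(number_as_integer)
```

Note that as.integer() truncates rather than rounds, so use round() first if rounding is what you need.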
# Example of integer data type
i <- 2L
# Example of character data type
c <- "MBA ITB"
# Example of logical data type
l <- TRUE
# Example of complex data type
com <- 3 + 2i
R has 5 main data structures, namely vector, list, matrix, factor, and data frame:
# Example of Vector
v <- c("banana", "apple", "tomato")
# Example of List
li <- list(1, "a", TRUE, 1 + 4i)
# Example of Matrix
m <- matrix(c(1:3))
# Example of Factor
f <- factor(c(
  "Male",
  "Female",
  "Female",
  "Male",
  "Male",
  "Female"
))
# Example of Data Frame
df_student <- data.frame(
  name = c("Dito", "Dian", "Rosidi"),
  age = c(22, 23, 27),
  sex = c("Male", "Female", "Male")
)
A function is a set of command lines or code arranged together to perform a specific task. In R, we can use built-in functions or create new ones.
# Create a vector of ages
age <- c(22, 23, 21, 8, 10, 14, 15)
# Calculate the average age
mean(age)
## [1] 16.14286
# Create a function to calculate Return on Investment (as a percentage)
roi <- function(profit, cost_of_investment) {
  return((profit / cost_of_investment) * 100)
}
# Use the ROI function that has been created
roi(profit = 175, cost_of_investment = 75)
## [1] 233.3333
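Two features of R functions that the examples above do not show are default argument values and vectorization. The sketch below illustrates both; percent_change() and its arguments are hypothetical names introduced only for this example.

```r
# Hypothetical function: percentage change from old_value to new_value.
# digits has a default value, so callers may leave it out.
percent_change <- function(old_value, new_value, digits = 1) {
  change <- (new_value - old_value) / old_value * 100
  round(change, digits)
}

# Call with single values, relying on the default for digits
change_single <- percent_change(old_value = 80, new_value = 100)

# Call with vectors: the arithmetic is applied element by element
change_vector <- percent_change(c(50, 200), c(75, 150))

print(change_single)
print(change_vector)
```

Because arithmetic in R works element by element, the same function handles one pair of values or a whole column of them without a loop.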
An R package is a collection of functions or data in R. There are many packages developed by other users. To use the functions in a package, we must first install and load the package. The following are examples of commonly used packages.
The readxl package is useful for importing data in .xlsx or .xls format into the R environment.
# Install package
install.packages("readxl")
# Load package
library(readxl)
# Import .xls data
df_xlsx <- read_xls("data/Superstore.xls")
# View data
df_xlsx
The readr package is useful for importing data in .csv format or other text formats into the R environment.
# Install package
install.packages("readr")
# Load package
library(readr)
# Import .csv data
df_csv <- read_csv("data/Superstore.csv")
# View data
df_csv
dplyr is an R package that is widely used for data transformation and exploration.
# Install package
install.packages("dplyr")
# Load package
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.4
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
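Calling install.packages() each time the code runs downloads the package again even when it is already installed. One possible way to avoid this, shown as a sketch only, is to check first which packages are missing. The helper missing_packages() is a hypothetical name, not part of any package used in this module.

```r
# Hypothetical helper: return the names in pkgs that are not installed.
# requireNamespace() returns TRUE when a package can be loaded
# (it also loads the package's namespace as a side effect).
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("readxl", "readr", "dplyr"))

# Install only what is missing, and only in an interactive session,
# so that a script never installs packages silently
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After this check, library() still has to be called for each package that the code uses.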
The following are some examples of common functions from the dplyr package.
The first function is select() which is used to select a particular column.
# Select the "order_date" and "customer_name" columns
df_select <- select(df_csv, c("order_date", "customer_name"))
# Show the first 6 rows
head(df_select)
The next function is filter(), which filters rows based on a condition.
# Filter segment = Corporate
df_filter <- filter(df_csv, segment == "Corporate")
# Show the first 6 rows
head(df_filter)
The mutate() function can be used to create new variables.
# Create a new variable using mutate()
df_mutate <- mutate(df_csv, cost = sales - profit)
# Show the first 6 rows
head(df_mutate)
The summarise() function is used to summarize several data values into a single value. This function is especially useful when combined with other functions in dplyr. The summary functions that can be used include mean(), median(), sd(), min(), max(), quantile(), first(), and last(). For example, we will calculate the average sales.
# Calculate the average sales
df_summarise <- summarise(df_csv, mean(sales))
# Show calculation results
print(df_summarise)
## # A tibble: 1 x 1
##   `mean(sales)`
##           <dbl>
## 1          230.
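A common way to combine summarise() with another dplyr function is to group the rows first with group_by(), so that one summary row is produced per group. The sketch below uses a small made-up data frame, df_toy, as a stand-in for the Superstore data; its values are hypothetical and chosen only to keep the arithmetic easy to check.

```r
library(dplyr)

# Hypothetical toy data standing in for the Superstore file
df_toy <- data.frame(
  segment = c("Corporate", "Consumer", "Corporate", "Consumer"),
  sales = c(100, 50, 300, 150)
)

# Group the rows by segment
df_grouped <- group_by(df_toy, segment)

# Compute named summaries, one row per segment
df_by_segment <- summarise(
  df_grouped,
  mean_sales = mean(sales),
  max_sales = max(sales)
)

print(df_by_segment)
```

Naming the summaries (mean_sales, max_sales) gives the result readable column names instead of names such as `mean(sales)`.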
The arrange() function can be used to sort values by a column.
# Sort data by "profit"
df_arrange <- arrange(df_csv, profit)
# Show the first 6 rows
head(df_arrange)
The pipe operator, written %>%, passes the result of one function as the first argument of the next function, so that several functions can be chained together.
# Finding the average sales rate for the corporate segment
df_pipe <- df_csv %>%
  filter(segment == "Corporate") %>%
  summarise(mean(sales))
# Show calculation results
print(df_pipe)
## # A tibble: 1 x 1
##   `mean(sales)`
##           <dbl>
## 1          234.
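To see what the pipe changes, it can help to write the same steps both ways. The sketch below compares a nested call with its piped equivalent on a small made-up data frame; df_toy and its values are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for the Superstore file
df_toy <- data.frame(
  segment = c("Corporate", "Consumer", "Corporate"),
  sales = c(100, 50, 300),
  profit = c(20, 5, 90)
)

# Nested form: the steps are read from the inside out
nested <- arrange(
  mutate(filter(df_toy, segment == "Corporate"), cost = sales - profit),
  cost
)

# Piped form: the same steps are read from top to bottom
piped <- df_toy %>%
  filter(segment == "Corporate") %>%
  mutate(cost = sales - profit) %>%
  arrange(cost)

# Check that both forms give the same result
same_result <- isTRUE(all.equal(nested, piped))
print(same_result)
```

Both forms run the same functions in the same order; the pipe only changes how the code is written, which usually makes longer chains easier to read.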
Create a data frame containing information on 10 of your classmates (name, age, gender, hobbies).
Create a function to calculate the Return on Equity.
Find 5 R packages and explain their purpose.